Lost in Translations? Building Sentiment Lexicons using Context Based Machine Translation
Authors
Abstract
In this paper, we propose a simple yet effective approach to automatically building sentiment lexicons from English sentiment lexicons using publicly available online machine translation services. The method does not rely on any semantic resources or bilingual dictionaries, and can be applied to many languages. We propose to overcome the low coverage problem by putting each English sentiment word into different contexts to generate different phrases, which effectively prompts the machine translation engine to return different translations for the same English sentiment word. Experimental results on building a Chinese sentiment lexicon (available at https://github.com/fannix/ChineseSentiment-Lexicon) show that the proposed approach significantly improves the coverage of the sentiment lexicon while achieving relatively high precision.
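The context-based idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the context templates, the function names, and the `translate` callable are all hypothetical, and the abstract does not say which contexts or which translation service were used. The sketch only shows how placing one English word into several contexts can yield several distinct translations for it. It stores the whole translated phrase; isolating the target-language sentiment word is not covered here.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical context templates (an assumption; the abstract does not list them).
TEMPLATES = ("{w}", "very {w}", "a {w} thing", "it is {w}")


def build_phrases(word: str, templates: Iterable[str] = TEMPLATES) -> List[str]:
    """Place one English sentiment word into each context template."""
    return [t.format(w=word) for t in templates]


def collect_translations(
    words: Iterable[str],
    translate: Callable[[str], str],
    templates: Iterable[str] = TEMPLATES,
) -> Dict[str, Set[str]]:
    """Map each English word to the set of distinct translations of its phrases.

    `translate` is supplied by the caller (for example, a wrapper around an
    online machine translation service); it is a placeholder in this sketch.
    """
    templates = tuple(templates)
    result: Dict[str, Set[str]] = {}
    for word in words:
        translations = set()
        for phrase in build_phrases(word, templates):
            target = translate(phrase)
            if target:  # skip phrases the translator could not handle
                translations.add(target)
        result[word] = translations
    return result
```

With a real translation backend plugged in as `translate`, a word whose phrases translate differently in different contexts would end up with more than one entry in its set, which is the coverage gain the abstract describes.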
Similar resources
Sentiment Lexicons for Arabic Social Media
Existing Arabic sentiment lexicons have low coverage—only a few thousand entries. In this paper, we present several large sentiment lexicons that were automatically generated using two different methods: (1) by using distant supervision techniques on Arabic tweets, and (2) by translating English sentiment lexicons into Arabic using a freely available statistical machine translation system. We c...
Building sentiment Lexicons applying graph theory on information from three Norwegian thesauruses
Sentiment lexicons are the most used tool to automatically predict sentiment in text. To the best of our knowledge, there exist no openly available sentiment lexicons for the Norwegian language. Thus in this paper we applied two different strategies to automatically generate sentiment lexicons for the Norwegian language. The first strategy used machine translation to translate an English sentim...
Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
We present new direct data analysis showing that dynamically-built context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine translation (SMT) than conventional static phrasal translation lexicons, which ignore all contextual information. After several years of surprising negative results, recent work suggests that context-dependent phrasal tr...
How Translation Alters Sentiment
Sentiment analysis research has predominantly been on English texts. Thus there exist many sentiment resources for English, but less so for other languages. Approaches to improve sentiment analysis in a resource-poor focus language include: (a) translate the focus language text into a resource-rich language such as English, and apply a powerful English sentiment analysis system on the text, and...
Building a robust sentiment lexicon with (almost) no resource
Creating sentiment polarity lexicons is labor intensive. Automatically translating them from resourceful languages requires in-domain machine translation systems, which rely on large quantities of bi-texts. In this paper, we propose to replace machine translation by transferring words from the lexicon through word embeddings aligned across languages with a simple linear transform. The approach ...